
    Document Logical Structure Analysis Based on Perceptive Cycles

    This paper describes a neural network (NN) approach for logical document structure extraction. In this architecture, called a Transparent Neural Network (TNN), the document structure is stretched along the layers, allowing the interpretation to be decomposed from the physical level (NN input) to the logical level (NN output). The intermediate layers represent successive interpretation steps, and each neuron is visible and associated with a logical element. Recognition proceeds by repetitive perceptive cycles that propagate information through the layers. When the recognition rate is low, the system improves it by back-propagating the error, correcting the input or selecting a better-adapted feature subset. Several feature subsets are created using a modified filter method. First experiments on scientific documents are encouraging. (The original publication is available at http://www.springerlink.com/)
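The perceptive-cycle idea above can be sketched as a toy two-layer network that, when its output confidence is too low, back-propagates the error all the way down to the input and corrects the feature vector itself. Everything below (layer sizes, the confidence threshold, the squared-error criterion) is an illustrative assumption, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 6 physical features -> 4 intermediate neurons -> 3 logical labels.
W1 = rng.normal(size=(6, 4))
W2 = rng.normal(size=(4, 3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(x @ W1)   # intermediate interpretation layer
    y = sigmoid(h @ W2)   # logical-label layer
    return h, y

def perceptive_cycles(x, target, cycles=20, lr=0.5, threshold=0.9):
    """Repeat forward passes; while confidence stays low, back-propagate the
    squared error through both layers and nudge the input features."""
    for _ in range(cycles):
        h, y = forward(x)
        if y.max() >= threshold:          # confident enough: stop cycling
            break
        # Gradient of the squared error w.r.t. the input (chain rule through both layers).
        dy = (y - target) * y * (1 - y)
        dh = (dy @ W2.T) * h * (1 - h)
        x = x - lr * (dh @ W1.T)          # correct the input feature vector
    return x, y

x0 = rng.normal(size=6)
target = np.array([1.0, 0.0, 0.0])
x_corr, y = perceptive_cycles(x0, target)
```

The key point mirrored from the abstract is that the gradient is used to repair the *input*, not the weights, during recognition.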

    Segmentation de documents composites par une technique de recouvrement des espaces blancs

    We present a method for the segmentation of composite documents. Unlike most published work, we focus on non-Manhattan layouts, which are usually created by compositing: the pages to be processed contain several sub-documents that have to be isolated. We draw inspiration from the white-space cover technique introduced by Baird et al., adding a suite of pre- and post-processing steps specific to these particular documents. Evaluations are made on administrative records coming from various sources, provided to us by our industrial partner. As we do not have any ground-truth documents, we compared our results with those obtained by commercial OCR engines, which our method outperforms.
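As an illustration of the white-space-cover idea, the sketch below greedily extracts maximal all-white rectangles from a toy binary image (1 = ink, 0 = white) and masks each one out before searching for the next. The brute-force search and the parameters are assumptions made for the example; Baird's actual algorithm uses a far more efficient branch-and-bound enumeration:

```python
import numpy as np

def largest_white_rect(img):
    """Brute-force the largest all-white (0) axis-aligned rectangle.
    Fine for toy images; real white-space covers need a faster search."""
    H, W = img.shape
    best, best_area = None, 0
    for top in range(H):
        for left in range(W):
            if img[top, left]:
                continue
            for bottom in range(top, H):
                for right in range(left, W):
                    if img[top:bottom + 1, left:right + 1].any():
                        break  # rectangle touches ink; wider ones will too
                    area = (bottom - top + 1) * (right - left + 1)
                    if area > best_area:
                        best_area, best = area, (top, left, bottom, right)
    return best

def white_space_cover(img, n_rects=3, min_area=4):
    """Greedily extract white separators, masking each one before the next."""
    work = img.copy()
    rects = []
    for _ in range(n_rects):
        r = largest_white_rect(work)
        if r is None:
            break
        t, l, b, rt = r
        if (b - t + 1) * (rt - l + 1) < min_area:
            break
        rects.append(r)
        work[t:b + 1, l:rt + 1] = 1   # mark as used so it is not re-found
    return rects
```

On a page image, the extracted rectangles act as separators between the sub-documents.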

    Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets

    Non-technical losses (NTL) such as electricity theft cause significant harm to our economies; in some countries they may reach up to 40% of the total electricity distributed. Detecting NTL requires costly on-site inspections, so accurate prediction of NTL for customers using machine learning is crucial. To date, related research has largely ignored that the two classes of regular and non-regular customers are highly imbalanced and that NTL proportions may change, and has mostly considered small data sets, often preventing deployment of the results in production. In this paper, we present a comprehensive approach to assess three NTL detection models, Boolean rules, fuzzy logic and a Support Vector Machine, for different NTL proportions in large real-world data sets of hundreds of thousands of customers. This work has produced appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart-meter research to report effectiveness on imbalanced and large real-world data sets. (Proceedings of the Seventh IEEE Conference on Innovative Smart Grid Technologies, ISGT 2016.)
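The class-imbalance issue the abstract highlights is commonly countered with inverse-frequency sample weights. The sketch below uses a class-weighted logistic regression as a simple stand-in for the paper's SVM, on synthetic data with a 95/5 split; all numbers and distributions are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical imbalanced set: 950 regular customers, 50 NTL cases.
X_reg = rng.normal(0.0, 1.0, size=(950, 3))
X_ntl = rng.normal(1.5, 1.0, size=(50, 3))
X = np.vstack([X_reg, X_ntl])
y = np.concatenate([np.zeros(950), np.ones(50)])

# Inverse-frequency class weights: each class contributes equally to the loss.
w = np.where(y == 1, len(y) / (2 * y.sum()), len(y) / (2 * (len(y) - y.sum())))

def train_logreg(X, y, sample_weight, epochs=300, lr=0.1):
    """Class-weighted logistic regression trained by gradient descent."""
    beta = np.zeros(X.shape[1] + 1)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
        grad = Xb.T @ (sample_weight * (p - y)) / len(y)
        beta -= lr * grad
    return beta

beta = train_logreg(X, y, w)
p = 1.0 / (1.0 + np.exp(-(np.hstack([X, np.ones((len(X), 1))]) @ beta)))
pred = (p >= 0.5).astype(int)
recall = pred[y == 1].mean()   # fraction of true NTL cases caught
```

Evaluating with recall on the minority class, rather than raw accuracy, is the point: an unweighted model on a 95/5 split can score 95% accuracy while catching no NTL at all.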

    Reconnaissance de structures logiques par un réseau de neurones transparent

    In this article, we present a new neural-network-based approach for the recognition of the logical structure of documents. The system, called a Transparent Neural Network, combines a data-driven approach, through its training, with a model-driven approach, through the integration of a class model into its topology. Recognition is performed through interpretation-extraction cycles with correction of the inputs, allowing classification of the most ambiguous patterns. The system was applied to a base of 74 scientific articles, yielding a 10-point gain in recognition over a classical approach (91.7% on 21 logical structures versus 81.6%). After a brief survey of the methods proposed in the literature, we describe the architecture of the system, then discuss its self-correction capability during perceptive cycles. We then propose a data-partitioning method that considerably reduces the physical feature extraction task, and finally present the results obtained with such a system and its perspectives.

    Improved CHAID Algorithm for Document Structure Modelling

    This paper proposes a technique for the logical labelling of document images. It uses a decision-tree-based approach to learn and then recognise the logical elements of a page. A state-of-the-art OCR provides the physical features needed by the system: each block of text is extracted during layout analysis, and raw physical features are collected and stored in the ALTO format. The data-mining method employed here is "Improved CHi-squared Automatic Interaction Detection" (I-CHAID). The contribution of this work is the insertion of logical rules, extracted from logical layout knowledge, to support the decision tree. Two setups have been tested: the first uses one tree per logical element, the second a single tree for all the logical elements to be recognised. The main system, implemented in Java, coordinates the third-party tools (OmniPage for the OCR part and SIPINA for the I-CHAID algorithm) using XML and XSL transforms. It was tested on around 1000 documents belonging to the ICPR'04 and ICPR'08 conference proceedings, representing about 16,000 blocks. The final error rate for determining the logical labels (among 9 different ones) is less than 6%.
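The core of CHAID-style splitting, choosing the categorical feature most dependent on the label via a chi-squared test, plus a logical rule layered on top of the tree's decision, can be sketched as follows. The features, labels and the "a title can only sit in the top band of the page" rule are illustrative assumptions, not the paper's actual rules:

```python
import numpy as np

def chi2_stat(x, y):
    """Pearson chi-squared statistic between two categorical arrays."""
    xs, ys = np.unique(x), np.unique(y)
    obs = np.array([[np.sum((x == a) & (y == b)) for b in ys] for a in xs], float)
    exp = obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True) / obs.sum()
    return ((obs - exp) ** 2 / exp).sum()

def chaid_root_split(X_cat, y):
    """Pick the feature whose categories depend most on the label,
    i.e. the CHAID-style root split."""
    scores = [chi2_stat(X_cat[:, j], y) for j in range(X_cat.shape[1])]
    return int(np.argmax(scores))

# Hypothetical block features: [position_band (0=top), font_size_class]
X = np.array([[0, 2], [0, 2], [0, 1], [1, 0], [1, 0], [2, 0], [2, 1], [2, 0]])
y = np.array(['title', 'title', 'author', 'body', 'body', 'body', 'body', 'body'])

j = chaid_root_split(X, y)

def apply_logical_rule(label, position_band):
    """Logical-layout rule injected on top of the tree's prediction:
    an illustrative veto, demoting a 'title' found outside the top band."""
    if label == 'title' and position_band != 0:
        return 'body'
    return label
```

A full I-CHAID tree would apply such chi-squared splits recursively with category merging; the rule layer shows where logical knowledge can override a purely statistical decision.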

    Séparation manuscrit et imprimé dans des documents administratifs complexes par utilisation de SVM et regroupement

    This paper proposes a methodology for the segmentation of printed and handwritten zones in document images. The documents are mainly administrative, processed in an unconstrained industrial setting: a large volume of pages must be handled every day, and because they come from different clients, their content, layout and digitization quality vary widely. The goal is to isolate handwritten notes from the other parts in order to subsequently apply dedicated processing to the printed and the handwritten layers. To achieve this, we propose a four-step procedure: preprocessing, geometrical layout analysis at the pseudo-word level, classification using an SVM, then post-correction integrating context to improve quality. The classification rates are around 90% for segmenting printed, handwritten and noisy zones.

    A Fast Learning Strategy Using Pattern Selection for Feedforward Neural Networks

    Intelligent pattern selection is an active learning strategy in which the classifier selects the most informative patterns during training. This paper investigates such a strategy where the informativeness of a pattern is measured as the approximation error produced by the classifier. The algorithm builds the training corpus starting from a small, randomly chosen initial dataset; new patterns are then added to the learning corpus based on their error sensitivity, i.e. the training dataset is expanded by selecting the most erroneous patterns. Our experimental results on the MNIST digit dataset show that only 3.26% of the training data is sufficient for training without decreasing the performance (98.36%) of the resulting neural classifier.
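The selection loop described above (seed with a small random set, then repeatedly add the most erroneous patterns) can be sketched with a deliberately tiny classifier. The nearest-centroid model, the 2-D Gaussian data and all sizes are assumptions standing in for the paper's MNIST / feedforward-network setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-class toy data standing in for the digit images.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

def fit_centroids(Xs, ys):
    # A nearest-centroid classifier keeps the sketch tiny; the paper trains an MLP.
    return np.array([Xs[ys == c].mean(0) for c in (0, 1)])

def predict_scores(c, X):
    d = np.stack([((X - c[k]) ** 2).sum(1) for k in (0, 1)], 1)
    return d[:, 0] / d.sum(1)        # crude score: near 1 => class 1

# Seed the training set with 5 random patterns per class, then grow it.
sel = list(rng.choice(np.where(y == 0)[0], 5, replace=False)) \
    + list(rng.choice(np.where(y == 1)[0], 5, replace=False))
for _ in range(5):
    c = fit_centroids(X[sel], y[sel])
    err = np.abs(predict_scores(c, X) - y)   # approximation error per pattern
    err[sel] = -1                            # never re-select a chosen pattern
    sel.extend(np.argsort(err)[-10:])        # add the 10 most erroneous ones

c = fit_centroids(X[sel], y[sel])
acc = ((predict_scores(c, X) >= 0.5) == y).mean()
```

After five rounds, only 60 of the 400 patterns have been used for training, echoing the paper's finding that a small, well-chosen fraction of the data can suffice.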

    Automatic indexing and reformulation of ancient dictionaries

    This paper deals with the automatic indexing and reformulation of ancient dictionaries. The objective is to ease access to ancient printed documents from the 16th to the 19th century for a diverse public (historians, scientists, librarians, etc.). Since facsimile mode is insufficient, the aim is to go further and use indexing based on the formal structure representative of the contents in order to optimize their exploration. Starting from a first indexing experiment carried out on more recent documents, the TLF (Trésor de la Langue Française: Treasure of the French Language) at the ATILF laboratory (Nancy, France), we extended the indexing technique to the automatic reformulation and re-edition of ancient dictionaries. Given the extent of the problem, however, we limited our investigations to a very specific collection of the ATILF laboratory, the Trévoux dictionary (defined later).

    Réseau de neurones dynamique perceptif - Application à la reconnaissance de structures logiques de documents

    Logical structure extraction of documents remains a challenging problem due to their inherent complexity and the gap between the physical features extracted from the image and their corresponding logical interpretation. Most of the literature proposes model-driven approaches that are not generic enough to handle complex and noisy documents: they use no intermediate interpretation steps and do not explain the relationships between the physical blocks and the corresponding logical labels. The main objective of this thesis is to develop a hybrid method, both data-driven and model-driven, capable of learning those relationships and of simulating human perception during the logical recognition task. We propose a Dynamic Perceptive Neural Network that overcomes the drawbacks of previous systems. Four main points have been developed:
    - a special network topology based on a local representation, into which knowledge can be integrated; the logical interpretation is unfolded along the layers of the network, and a training stage determines the weight of each link;
    - perceptive cycles (several bottom-up and top-down passes) that perform the recognition; the network is able to generate hypotheses, validate them and detect ambiguous patterns, while contextual feedback drives the correction of the input features to improve the recognition rate;
    - a clustering of the input features that speeds up recognition; subsets of features are computed automatically and fed to the network progressively, so that the amount of computation adapts to the complexity of the pattern to be recognised;
    - a dynamic extension of the network that integrates the input-correction information during the training stage, yielding more appropriate behaviour during recognition. This extension uses a Time Delay Neural Network architecture to take into account the variation of the input data after each perceptive cycle, while the recognition step remains very close to the static one.